
Text Tower Refactor #185

Merged: 28 commits into main, Nov 4, 2022
Conversation

@rwightman (Collaborator)

A refactor of #178 that keeps backwards compatibility for existing weights/models but provides a different base model for new models using custom text towers...

@rom1504 (Collaborator) commented Oct 16, 2022

Would be nice to finish this sometime @iejMac @rwightman so we can have some multilingual CLIP in the future.

@rom1504 (Collaborator) commented Oct 16, 2022

Code looks good to me.

What do we want to validate here?

Maybe something like:

  • exact inference result before/after (a sketch of this check follows below)
  • can still load with torch.load
  • training works the same before/after

Anything else?
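
For the first item, a minimal sketch of the exact-inference check might look like this. Everything here is an assumption for illustration: the model name, the `laion400m_e32` pretrained tag, and the `reference_outputs.pt` file (assumed to have been written by this same script while on the main branch).

```python
import torch
import open_clip

# Minimal before/after inference check (sketch). The pretrained tag and the
# reference file path are placeholders; the reference tensors are assumed to
# have been saved by this same script on the main branch.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion400m_e32")
model.eval()

torch.manual_seed(0)
image = torch.randn(1, 3, 224, 224)                 # fixed dummy image batch
text = open_clip.tokenize(["a diagram", "a dog"])   # fixed dummy captions

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

ref = torch.load("reference_outputs.pt")  # saved on main with the same inputs
assert torch.allclose(image_features, ref["image"], atol=1e-5)
assert torch.allclose(text_features, ref["text"], atol=1e-5)
```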

@rom1504 (Collaborator) commented Oct 16, 2022

Ideally we could validate that with unit tests, but we could also do one-shot testing.

@rwightman (Collaborator, Author)

@rom1504 yeah, I think those tests are reasonable; I already checked some pretrained zero-shot runs. Maybe an epoch of training on a smaller model we have some recent results for...?

@lopho (Contributor) commented Oct 17, 2022

Loading OpenAI weights now fails due to a change in model keys; specifically, the text. prefix is now gone from the text model. E.g. vit-l-14:

Missing key(s) in state_dict: "positional_embedding", "text_projection", "transformer.resblocks.0.ln_1.weight", ...
Unexpected key(s) in state_dict: "text.positional_embedding", "text.text_projection", "text.transformer.resblocks.0.attn.in_proj_weight", ...

This happens when loading via the factory functions open_clip.create_model_and_transforms and open_clip.create_model.

LAION weights still load fine, and inference on text and image matches the main branch. (EDIT) Tested with vit-l-14 laion2b_s32b_b82k, using random noise as image input and random strings of 1-5 words, each word 1-10 symbols.
I would recommend a proper integration test over a larger input space.
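
For anyone hitting this before the fix lands, a hypothetical workaround sketch is to strip the text. prefix from the converted state dict before loading manually. The checkpoint path below is a placeholder, and the real fix belongs in the conversion code itself, not in user code:

```python
import torch
import open_clip

# Hypothetical workaround (sketch): strip the "text." prefix left in by the
# broken OpenAI conversion so the keys line up with the classic CLIP module.
# "ViT-L-14-converted.pt" is a placeholder path for the converted checkpoint.
model = open_clip.create_model("ViT-L-14")

state_dict = torch.load("ViT-L-14-converted.pt", map_location="cpu")
remapped = {
    (k[len("text."):] if k.startswith("text.") else k): v
    for k, v in state_dict.items()
}
model.load_state_dict(remapped)
```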

@rom1504 (Collaborator) commented Oct 17, 2022

@lopho if you're interested in writing more automated tests, it would definitely be appreciated

@lopho (Contributor) commented Oct 18, 2022

I can write tests, but at the earliest toward the end of the week / weekend.
To be clear: tests that compare outputs across commits as part of CI to catch regressions, as well as tests for loading already published models. If I have the time, I could also take a look at tests for training.
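
A minimal sketch of what such a CI regression test could look like, in pytest. The reference-file layout, tolerances, and the assumption that a seeded random init is reproducible across commits are all illustrative, not something settled in this thread:

```python
import pytest
import torch
import open_clip

MODELS = ["ViT-B-32", "RN50"]  # extend to cover more configs

@pytest.mark.parametrize("model_name", MODELS)
def test_inference_regression(model_name):
    # Seed before model creation so the random init is reproducible,
    # then compare against outputs stored from a known-good commit.
    torch.manual_seed(0)
    model, _, _ = open_clip.create_model_and_transforms(model_name)
    model.eval()

    image = torch.randn(1, 3, 224, 224)
    text = open_clip.tokenize(["a photo of a cat"])

    with torch.no_grad():
        img_out = model.encode_image(image)
        txt_out = model.encode_text(text)

    # tests/refs/<model>.pt is assumed to be generated once by this same
    # code on a known-good commit and checked into the repo.
    ref = torch.load(f"tests/refs/{model_name}.pt")
    assert torch.allclose(img_out, ref["image"], atol=1e-4)
    assert torch.allclose(txt_out, ref["text"], atol=1e-4)
```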

…tom text model w/ existing models (for testing).
@rwightman (Collaborator, Author)

@rom1504 @lopho @iejMac

I fixed some issues and pulled in some other changes so that the testing only needs to be done once:

  • Loading OpenAI checkpoints was actually broken, as the conversion was left in; fixed.
  • I fixed pure float16 (and added pure bfloat16) support, including for the OpenAI weights path (torchscript and non-jit). This has come up a few times; it was broken in order to make AMP work for training.
  • Added a --force-custom-text flag to enable the custom text base model, to test the dual tower configs for eval and train (a usage sketch follows below).
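
A usage sketch for the new bits. Hedged: the precision strings follow the description above, but the pretrained tag is a placeholder and the CLI arguments surrounding --force-custom-text are assumptions:

```python
import open_clip

# Pure half-precision loading as described above (sketch; "laion400m_e32"
# is a placeholder pretrained tag).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32",
    pretrained="laion400m_e32",
    precision="fp16",   # pure float16; "bf16" for pure bfloat16
    device="cuda",
)

# The training-side flag could then be exercised roughly like:
#   python -m training.main --model ViT-B-32 --force-custom-text ...
# (the surrounding training arguments are omitted / assumed)
```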

@rom1504 (Collaborator) commented Nov 1, 2022

Do you think it's ready now, Ross?

@rom1504 (Collaborator) commented Nov 1, 2022

Code looks fine to me.

Ideally we would have unit tests (for inference and training), or at least check things once.

But I also think it's really safe at this point; we could probably just decide to merge to avoid blocking things further.

@lopho (Contributor) commented Nov 1, 2022

Yeah, sorry, I haven't found the time yet to write tests.
When I get around to it, I will try to manually test this branch pre-merge and post-merge.

@rwightman (Collaborator, Author)

@rom1504 I've tested quite a number of zero-shot eval models w/ different precisions, torchscript, custom tower enabled/disabled, and did a pure bf16 train locally on 1 GPU for 1 cc12m epoch (a rough sketch of the eval spot-check is below). Still need to test the ResNet and timm models to see if they still work (eval for ResNet, train an epoch for timm).

Should test training on a cluster w/ a proper dataset, maybe a few epochs of a 'B' model; not sure which cluster we should use / if we have anything in a good state :/ @JeniaJitsev ?
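
For reference, the kind of eager-vs-torchscript spot-check mentioned above might look like the sketch below. It assumes jit=True routes through the OpenAI torchscript path and that the two outputs should agree closely:

```python
import torch
import open_clip

# Sketch of an eager vs. torchscript spot-check on the same OpenAI weights.
torch.manual_seed(0)
image = torch.randn(1, 3, 224, 224)

eager = open_clip.create_model("ViT-B-32", pretrained="openai")
scripted = open_clip.create_model("ViT-B-32", pretrained="openai", jit=True)

with torch.no_grad():
    a = eager.encode_image(image)
    b = scripted.encode_image(image)

# Expect near-identical outputs; a large gap would flag a regression.
print((a.float() - b.float()).abs().max())
```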

@JeniaJitsev (Contributor)

@rwightman @rom1504 JUWELS Booster has so far handled runs with 32 nodes / 128 GPUs very robustly; no problems observed at all. So for testing models at 'B' scale, we are surely fine to go.

…ale. No bias on timm model projection by default. Fix null pretrained arg handling.
@rom1504 (Collaborator) commented Nov 3, 2022

I think it's not a good idea to keep pushing more unrelated changes here.
If we find after merging that this PR breaks things, it will be difficult to revert.

Let's not add anything more, and first validate and merge?

@rom1504 (Collaborator) commented Nov 3, 2022

Created #198 to track adding automated regression tests.
It can be done after this is merged, but it will make it much easier to merge PRs in the future.

@rwightman (Collaborator, Author)

> I think it's not a good idea to keep pushing more unrelated changes here. If we find after merging that this PR breaks things, it will be difficult to revert.
>
> Let's not add anything more, and first validate and merge?

I understand the sentiment, but with limited manpower and test coverage, and the demands of verifying training at scale, it's easy for a feature PR to end up as a release branch, since I want to ensure everything relevant to the near-term planned runs is tested.

Will keep new commits to bug fixes.

@rom1504 (Collaborator) commented Nov 3, 2022

OK, let's try to validate it ASAP and merge it so that things get easier.

[Review comment on src/training/main.py (outdated, resolved)]
@rwightman (Collaborator, Author)

I think this is ready to merge. @mitchellnw any concerns re: your JUWELS run? convnext is running but hasn't been stable; it seems like it might be stable with grad clipping enabled (which is a default for any supervised training of these models).

@mitchellnw (Contributor)

No concerns so far, but unfortunately I only set the time limit to 24h, so I am now waiting in the queue to resume.

[Screenshot: Screen Shot 2022-11-04 at 12 01 43 PM]

@rwightman (Collaborator, Author)

@mitchellnw without a special exception, 24h is the per-run time limit anyway...

@rwightman (Collaborator, Author)

OK, I'm going to merge this. There's definitely a possibility of some regressions, but it's best to get this structure into main and improve/fix from there. I will hold off on pushing a PyPI release until this has been on main for a bit with no showstopping wtfs popping up.

@rwightman merged commit 55174d7 into main on Nov 4, 2022
@mitchellnw (Contributor)

All good on the b/32 run; it reached 63.67%.
